Abstract

The collaborative effort between GMR Research & Technology and the University of Wisconsin-Madison
aimed at finding novel approaches to reduced-rate representation and sampling. The effort
concentrated on exploring data-adaptive techniques and non-adaptive structured sensing, as well as
comparing randomized-projection-based approaches to nonlinear affine (NoLAff) approaches.

The  approaches  explored  in  this  work  share  a  common  theme  of  improving  upon  purely  random 
encoding.  Adaptive  sampling  utilizes  partial  information  from  previous  observations  to  focus  subsequent 
observations  onto  relevant  signal  components,  and  provides  significant  improvements  in  the  measurement 
signal-to-noise  ratio.  Toeplitz  structured  matrices  are  effective  sensing  structures  that  are  efficient  to 
generate  and  implement  in  practice.  The  acquisition  process  of  NoLAff  sampling  can  be  approximately 
modeled  using  special  deterministic  sensing  matrices,  and  the  inherent  structure  can  be  leveraged  to 
reduce  decoding  from  convex  optimization  to  hypothesis  testing,  which  is  efficient  both  computationally 
and  from  a  data  rate  perspective. 
Overview


The collaborative effort between GMR Research & Technology, Inc. and the University of Wisconsin-Madison
was aimed at exploiting the expertise of the teams in nonlinear and affine signal processing and
in compressed sensing (CS) toward finding new approaches to reduced-rate representation and sampling and
related areas of research. The effort whose final results are described here concentrated on data-adaptive
techniques, Toeplitz-structured sensing, and a comparison of randomized-projection-based compressive
sensing approaches with nonlinear affine approaches.

In Part I we describe an adaptive approach to CS. The theory of compressed sensing shows that samples
in the form of random projections are optimal for recovering sparse signals in high-dimensional spaces (i.e.,
finding needles in haystacks), provided the measurements are noiseless. However, noise is almost always
present in applications, and compressed sensing suffers from it. The signal-to-noise ratio per dimension using
random projections is very poor, since sensing energy is equally distributed over all dimensions. Consequently,
the ability of compressed sensing to locate sparse components degrades significantly as noise increases. It
is possible, in principle, to improve performance by “shaping” the projections to focus sensing energy in the
proper dimensions. The main question addressed here is: can projections be adaptively shaped to achieve
this focusing effect? The answer is yes, and we demonstrate a simple, computationally efficient procedure
that does so. This part is essentially the conference paper we (R. M. Castro, J. Haupt, R. Nowak, G.
Raz) presented at ICASSP 08.

Part II explores novel CS matrices that allow CS applications of high dimensionality and compressed
excitation for system identification. The problem of recovering a sparse signal $x \in \mathbb{R}^n$ from a relatively
small number of its observations of the form $y = Ax \in \mathbb{R}^k$, where $A$ is a known $k \times n$ matrix and $k \ll n$, has
recently received a lot of attention under the rubric of compressed sensing (CS) and has applications in
many areas of signal processing such as data compression, image processing, dimensionality reduction, etc.
Recent work has established that if $A$ is a random matrix with entries drawn independently from certain
probability distributions then exact recovery of $x$ from these observations can be guaranteed with high
probability. In this paper, we show that Toeplitz-structured matrices with entries drawn independently from
the same distributions are also sufficient to recover $x$ from $y$ with high probability, and we compare the
performance of such matrices with that of fully independent and identically distributed ones. The use of
Toeplitz matrices in CS applications has several potential advantages: (i) they require the generation of only
$O(n)$ independent random variables; (ii) multiplication with Toeplitz matrices can be efficiently implemented
using the fast Fourier transform, resulting in faster acquisition and reconstruction algorithms; and (iii)
Toeplitz-structured matrices arise naturally in certain application areas such as system identification. This part
summarizes results from a conference paper we (W. U. Bajwa, J. D. Haupt, G. M. Raz, S. J. Wright, and
R. D. Nowak) presented at SSP 07, and our recent refinement that appeared at CISS 08.
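
Advantages (i) and (ii) can be illustrated with a short sketch. The code below is not taken from the report: it assumes Gaussian entries scaled by $1/\sqrt{k}$, and the function names `toeplitz_matvec_fft` and `toeplitz_dense` are hypothetical. It builds a $k \times n$ Toeplitz matrix from $n + k - 1$ random draws and multiplies it with a vector by embedding it in a circulant matrix, whose action is a circular convolution and can therefore be computed with FFTs.

```python
import numpy as np

def toeplitz_matvec_fft(c, r, x):
    """Compute A @ x for the k-by-n Toeplitz matrix with first column c and first row r."""
    k, n = len(c), len(r)
    # Embed A in an N-by-N circulant matrix, N = n + k - 1, whose first column is
    # [c_0, ..., c_{k-1}, r_{n-1}, ..., r_1].
    v = np.concatenate([c, r[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(k - 1)])
    # A circulant product is a circular convolution, i.e. a pointwise product of FFTs.
    y = np.fft.ifft(np.fft.fft(v) * np.fft.fft(x_pad)).real
    return y[:k]

def toeplitz_dense(c, r):
    """Build the same Toeplitz matrix explicitly (used here only as a check)."""
    d = np.subtract.outer(np.arange(len(c)), np.arange(len(r)))  # d[i, j] = i - j
    return np.where(d >= 0, c[np.maximum(d, 0)], r[np.maximum(-d, 0)])

rng = np.random.default_rng(0)
k, n = 8, 32
g = rng.standard_normal(n + k - 1) / np.sqrt(k)  # only n + k - 1 independent draws
c = g[:k]                                        # first column
r = np.concatenate([g[:1], g[k:]])               # first row (shares the corner entry)
x = np.zeros(n)
x[[3, 17]] = [1.0, -2.0]                         # a 2-sparse test signal
y_fast = toeplitz_matvec_fft(c, r, x)
y_dense = toeplitz_dense(c, r) @ x
```

The FFT route costs on the order of $(n + k)\log(n + k)$ operations per product, compared with $kn$ for the dense product, which is where the faster acquisition and reconstruction mentioned in (ii) would come from.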

Finally, Part III compares randomized-projection-based approaches to compressive sensing
with the deterministic nonlinear affine approach. While NoLAff does not, strictly speaking, use a sensing matrix,
it can nonetheless be shown to have nearly equivalent encoding structures that, in the quasi-linear
approximation cases, can be described as a deterministic sensing matrix. This approach in particular allows us to move
away from convex optimization decoding to a hypothesis testing approach, which has been
shown to be highly efficient from a data rate perspective, essentially allowing innovations-rate sampling.
The sensing matrix equivalent in NoLAff allows the encoder to retain some of the orthogonality between
signal subspaces of interest, and hence allows compressive sensing that is both computationally efficient
and data rate efficient.
Part I

Finding  Needles  in  Noisy  Haystacks 

2  Introduction 

Surprising  mathematical  findings  and  stunning  practical  results  have  propelled  compressed  sensing  into  the 
signal  processing  limelight  and  have  had  a  profound  effect  on  our  understanding  of  signal  acquisition  and 
sampling.  Consider  a  signal  that  can  be  represented  (exactly  or  approximately)  by  a  sparse  representation 
(the  superposition  of  a  small  number  of  basis  vectors).  The  basic  idea  of  compressed  sensing  is  that  if  one 
takes samples in the form of projections of the signal and if these projections are incoherent with the basis
vectors, then the sparse representation can be recovered from a small number of such samples (roughly
proportional to the number of components in the sparse representation) provided the observations are
noise-free [2,4]. In addition, compressed sensing remains stable in the presence of random noise; i.e., the recovery
degrades  gracefully,  but  markedly,  as  the  noise  level  is  increased  [3,4].  This  paper  investigates  the  noise 
sensitivity  phenomenon  and  proposes  an  improved  approach  based  on  adaptive  sensing. 

Incoherence  between  the  projection  vectors  and  the  signal  basis  vectors  is  essential  to  compressed  sensing, 
and  is  required  for  successful  recovery  from  a  small  number  of  non-adaptive  samples.  The  incoherence 
condition  guarantees  that  one  “spreads”  the  sensing  energy  over  all  the  dimensions  of  the  coordinate  system 
of  the  basis.  In  essence,  each  compressive  sample  deposits  an  equal  fraction  of  sensing  energy  in  every 
dimension,  making  it  possible  to  locate  the  sparse  components  without  sensing  directly  in  each  and  every 
dimension,  which  would  require  a  number  of  samples  equal  to  the  length  of  the  signal.  When  the  observations 
are  corrupted  by  noise,  however,  the  signal  to  noise  ratio  (SNR)  per  dimension  is  necessarily  much  lower 
using  this  approach  than  if  we  had  used  all  sensing  energy  to  probe  a  single  coordinate.  Thus,  noise  can 
make  the  recovery  of  the  sparse  components  much  more  difficult. 

It  is  intuitively  clear  that  focused  samples  can  be  tremendously  helpful.  Indeed,  if  a  genie  were  to  provide 
the  locations  of  the  sparse  signal  components  a  priori,  then  we  would  know  that  the  optimal  samples  would 
be projections onto the corresponding basis vectors themselves, maximizing the SNR per sample. Without
a  genie,  it  is  sensible  to  attempt  to  recover  the  locations  directly  so  that  subsequent  samples  can  be  focused 
into  the  correct  subspace.  The  potential  advantages  of  an  adaptive  projection  scheme  are  demonstrated 
in  [5],  but  this  procedure  does  not  scale  well  with  problem  dimension.  Here  we  propose  a  different  adaptive 
strategy  for  which  the  shaping  of  the  projections  can  be  computed  in  time  linear  in  the  length  of  the  signal, 
and  therefore  is  no  more  computationally  demanding  than  standard  compressed  sensing.  Begin  with  an 
incoherent  projection  sample,  which  should  provide  a  crude  indication  of  potential  locations  for  the  sparse 
components.  Now,  use  this  information  to  shape  the  next  projection  so  that  it  is  a  bit  less  incoherent  and  a 
bit  more  focused  on  these  potential  locations.  Repeat  this  procedure  until  the  projections  are  mostly  focused 
on  one  location,  which  hopefully  corresponds  to  an  actual  signal  component.  Keep  iterating  this  process, 
with  the  previously  identified  components  removed,  until  no  additional  significant  components  are  found. 

The remainder of the paper is organized as follows. A brief review of traditional (non-adaptive)
compressive sensing is given in Section 3. In Section 4 we describe our strategy for projection focusing that is
based on a general-purpose Bayesian model for sparse components and an (approximate) entropy-maximizing
projection shaping at each step. Computational experiments in Section 5 demonstrate that significant
performance gains are possible through this adaptive procedure, especially when the signal is very sparse and
the SNR per dimension is low. Finally, some conclusions are discussed in Section 6.


3  Compressive  Sensing  Review 

Compressive  sensing  (CS)  describes  a  collection  of  methods  by  which  sparse  high-dimensional  signals  can 
be  accurately  and  efficiently  recovered  from  a  small  (relative  to  the  dimension)  number  of  observations.  CS 
employs  a  sampling  model  which  is  a  natural  generalization  of  conventional  point  sampling.  Each  observation 
of an $m$-sparse vector $x \in \mathbb{R}^n$ is described by

$$Y(t) = \phi(t)^T x + W(t), \tag{1}$$

for $t = 1, 2, \ldots, k$, where the sampling vector $\phi(t) \in \mathbb{R}^n$ is chosen by and known to the observer and satisfies
$\|\phi(t)\|_2 = 1$, and $W(t) \sim \mathcal{N}(0, \sigma_w^2)$ is independent of $\phi(t)$.

The earliest contributions to CS considered noiseless settings where the sampling vectors $\{\phi(t)\}_{t=1}^k$ were
a collection of random vectors whose entries were drawn independently according to some distribution (e.g.,
Gaussian). In these settings, it was shown that Basis Pursuit (identifying the vector with minimum $\ell_1$ norm¹
that agrees with the observations) efficiently recovers any $m$-sparse signal with overwhelming probability,
provided the number of observations satisfies $k \geq C m \log n$, where $C$ is some constant that does not depend
on the problem dimension [2,4]. In practice, it has been observed that between $3m$ and $5m$ samples often
suffice.

In  settings  where  sampling  noise  is  present,  the  provable  performance  of  CS  degrades  markedly.  The 
Basis  Pursuit  approach  does  not  apply  directly  in  this  setting,  and  one  possible  estimation  strategy  is  to 
minimize  the  weighted  sum  of  a  squared  error  term  and  a  complexity  term,  given  by 

$$\hat{x}_k = \arg\min_{g \in \mathbb{R}^n} \frac{1}{2}\|y - \Phi g\|_2^2 + \tau\|g\|_1, \tag{2}$$

where $y$ is a vector of the observations $\{y(t)\}_{t=1}^k$, $\Phi$ is a matrix with rows given by the corresponding
$\phi(t)^T$, and $\tau$ is an appropriate tolerance. Other similar strategies have been proposed and analyzed, yielding
estimates that satisfy


$$\mathbb{E}\left[\frac{\|\hat{x}_k - x\|_2^2}{n}\right] \leq C\left(\frac{k}{m \log n}\right)^{-1}, \tag{3}$$


where  C  is  a  constant  that  depends  on  the  noise  power,  and  the  expectation  is  over  the  distribution  of  the 
noise and the projection vectors [3,4]. It is interesting to note that this bound is meaningful only when the
number of observations is at least $O(m \log n)$. This is similar to the number of observations required in the
noise-free setting; the difference here is that the error decays relatively slowly after this point.
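
A penalized least-squares estimate of the form (2) can be computed with many solvers; the experiments later in this part use the GPSR software [9]. As a minimal illustration only, the sketch below substitutes plain iterative soft thresholding (ISTA), and it assumes the objective $\frac{1}{2}\|y - \Phi g\|_2^2 + \tau\|g\|_1$. The function names, problem sizes, and noise level are assumptions chosen for the example.

```python
import numpy as np

def ista(Phi, y, tau, n_iter=500):
    """Approximately minimize 0.5*||y - Phi g||_2^2 + tau*||g||_1 by iterative soft thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the gradient of the quadratic term
    g = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = g + Phi.T @ (y - Phi @ g) / L      # gradient step on the squared-error term
        g = np.sign(z) * np.maximum(np.abs(z) - tau / L, 0.0)  # soft threshold (prox of the l1 term)
    return g

def objective(Phi, y, g, tau):
    """Value of the penalized least-squares objective at g."""
    return 0.5 * np.sum((y - Phi @ g) ** 2) + tau * np.sum(np.abs(g))

rng = np.random.default_rng(1)
k, n, m = 60, 200, 3
Phi = rng.choice([-1.0, 1.0], size=(k, n)) / np.sqrt(n)  # rows are unit-norm random sign vectors
x = np.zeros(n)
x[rng.choice(n, m, replace=False)] = 1.0                  # an m-sparse test signal
y = Phi @ x + 0.01 * rng.standard_normal(k)               # noisy observations as in (1)
tau_max = np.max(np.abs(Phi.T @ y))
x_hat = ista(Phi, y, 0.1 * tau_max)                       # a moderately regularized estimate
x_zero = ista(Phi, y, tau_max)                            # large tau: the estimate stays at zero
```

With this scaling of the objective, any $\tau \geq \|\Phi^T y\|_\infty$ leaves the iterate at zero, which is a useful sanity check when choosing a range of $\tau$ values to sweep.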


4  Adaptive  Projections  for  Sparse  Recovery 

In  this  section  we  present  an  adaptive  projection  algorithm  targeting  problems  where  the  signal  is  very 
sparse (e.g., described by a small number of components). The proposed approach consists of a greedy
procedure that attempts to recover the signal sequentially, component-by-component, and is inspired by our
earlier  work  [6]  where  we  considered  a  parametric  model.  In  this  work  we  use  a  related  model  for  which  it  is 
easy  to  use  a  Bayesian  approach  to  estimate  the  parameters.  In  [6]  this  is  done  using  non-adaptive  random 
projections.  Here  we  propose  a  technique  to  adapt  the  projections  based  on  previous  observations,  in  order 
to  significantly  improve  the  estimation  performance.  We  first  describe  our  methodology  when  the  signal  has 
a  single  non-zero  component,  and  later  we  generalize  this  approach  for  sparse  signals  with  multiple  non-zero 
components. 

4.1  A  Single  Needle  in  the  Haystack 

Let $x \in \mathbb{R}^n$, $n \in \mathbb{N}$, be a vector with at most one non-zero entry. The adaptive projection procedure
proposed follows a Bayesian-style approach, and so we have a generative model for the signal $x$. Let $t$ index
the sequential sampling process. At step $t$, define the random variable $L(t) \in \{1, \ldots, n\}$, with probability
mass function $p_i(t) = \Pr(L(t) = i)$. That is, $L(t)$ is a discrete random variable over the indices of the signal,
modeling that entry $i$ is non-zero with probability $p_i(t)$. Conditional on the value of $L(t)$, the amplitude of
the non-zero signal component is modeled as a Gaussian random variable, $A(t) \mid L(t) = i \sim \mathcal{N}(\mu_i(t), \sigma_i^2(t))$.

Thus,  our  model  has  the  form 

$$X(t) = (0, \ldots, 0, A(t), 0, \ldots, 0),$$

where only the entry $L(t)$ of $X(t)$ is non-zero. We assume $x$ is a realization of the random variable $X(t)$. Notice
that the distribution is parameterized by three quantities: $p(t) = (p_1(t), \ldots, p_n(t))$, $\mu(t) = (\mu_1(t), \ldots, \mu_n(t))$,
and $\sigma^2(t) = (\sigma_1^2(t), \ldots, \sigma_n^2(t))$. Initially, when $t = 0$ and no samples have been taken, we start with
a uniform prior on the location and a zero-mean distribution for the conditional amplitude, specifically
$p(0) = (1/n, \ldots, 1/n)$, $\mu(0) = (0, \ldots, 0)$ and $\sigma^2(0) = (\sigma_0^2, \ldots, \sigma_0^2)$, where $\sigma_0^2 > 0$. This prior distribution
is updated in a Bayesian manner as samples are acquired, giving rise to the model at step $t$, as described
above.

¹ The $\ell_1$ norm is defined by $\|x\|_1 = \sum_{i=1}^n |x_i|$, where $x_i$ is the $i$th component of $x$.

Recall  the  observation  model  in  (1).  Using  Bayes  rule  we  can  update  the  posterior  distribution,  and 
straightforward  calculations  yield  the  following  update  rules 


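$$\mu_i(t+1) = \frac{\phi_i(t)\,\sigma_i^2(t)\,y(t) + \mu_i(t)\,\sigma_w^2}{\phi_i^2(t)\,\sigma_i^2(t) + \sigma_w^2},$$

$$\sigma_i^2(t+1) = \frac{\sigma_i^2(t)\,\sigma_w^2}{\phi_i^2(t)\,\sigma_i^2(t) + \sigma_w^2},$$

$$p_i(t+1) \propto \frac{p_i(t)}{\sqrt{\phi_i^2(t)\,\sigma_i^2(t) + \sigma_w^2}} \exp\left(-\frac{1}{2}\,\frac{\left(y(t) - \phi_i(t)\,\mu_i(t)\right)^2}{\phi_i^2(t)\,\sigma_i^2(t) + \sigma_w^2}\right),$$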


where  y(t)  is  a  realization  of  Y(t),  and  in  the  update  of  p(t  +  1)  we  omit  the  explicit  expression  of  the 
normalization  constant. 
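
The update rules are the standard conjugate Gaussian update applied separately under each location hypothesis. A minimal sketch is given below; the function name `bayes_update`, the argument `noise_var` (the noise variance of the observation model), and the two-entry example are assumptions made for illustration. The posterior weights are computed in the log domain, which is a numerical choice made here and is not described in the text.

```python
import numpy as np

def bayes_update(p, mu, var, phi, y, noise_var):
    """One posterior update of (p, mu, var) after observing y = phi^T x + noise."""
    s = phi**2 * var + noise_var               # variance of y if entry i is the non-zero one
    resid = y - phi * mu                       # innovation under hypothesis i
    mu_new = (phi * var * y + mu * noise_var) / s
    var_new = var * noise_var / s
    with np.errstate(divide="ignore"):         # log(0) = -inf is acceptable for excluded entries
        log_w = np.log(p) - 0.5 * np.log(s) - 0.5 * resid**2 / s
    w = np.exp(log_w - np.max(log_w))          # subtract the max before exponentiating
    return w / w.sum(), mu_new, var_new

# Two candidate locations, uniform prior, unit prior variance; the projection
# probes entry 0 only and returns the value 2.
p0, mu0, var0 = np.array([0.5, 0.5]), np.zeros(2), np.ones(2)
p1, mu1, var1 = bayes_update(p0, mu0, var0, phi=np.array([1.0, 0.0]), y=2.0, noise_var=1.0)
```

In this small example the probed entry becomes more probable, its conditional mean moves halfway toward the observation, and its conditional variance halves, while the parameters of the unprobed entry are unchanged.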

The choice of the projection vectors $\phi(t)$ is critical for good performance. If we are constrained not to use
adaptive projections it is known that random projections are as uniformly informative as possible. These
can be, for example, Rademacher random vectors ($n$-vectors comprised of i.i.d. random variables taking
values $\pm 1/\sqrt{n}$ with equal probability). However, if that constraint is removed and adaptivity is allowed,
then one can use information gleaned from previous samples to “focus” the projection vectors, leading to
better performance.

We propose the following methodology: define the “shaped” random projection

$$\phi(t+1) = \left(\sqrt{p_1(t)}\,B_1, \sqrt{p_2(t)}\,B_2, \ldots, \sqrt{p_n(t)}\,B_n\right),$$

where $\{B_i\}$ are i.i.d. random variables, taking values $\pm 1$ with equal probability. Note that since $\sum_{i=1}^n p_i(t) = 1$
(because $p$ is a discrete probability distribution) we have $\|\phi(t+1)\|_2 = 1$. If at time $t$ we are very confident
that $i$ is the only non-zero entry of $x$, that is $p_i(t)$ is close to 1, then the shaped projection vector is going
to put a large amount of mass on that entry. While this may appear intuitively reasonable, there is also a
principled rationale for this particular shaping procedure, namely it is an attempt to make observation $Y(t)$
as informative as possible.
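
The shaping rule is simple to state in code. The sketch below is illustrative; the function name `shaped_projection` and the two example distributions are assumptions. It shows that a uniform $p$ gives an ordinary scaled random sign vector, while a peaked $p$ concentrates the sensing energy on one entry.

```python
import numpy as np

def shaped_projection(p, rng):
    """Return phi with phi_i = sqrt(p_i) * B_i, where B_i = +1 or -1 with equal probability."""
    signs = rng.choice([-1.0, 1.0], size=p.shape)
    return np.sqrt(p) * signs

rng = np.random.default_rng(0)
p_uniform = np.full(8, 1 / 8)                  # no information yet
p_peaked = np.array([0.93] + [0.01] * 7)       # confident that entry 0 is the non-zero one
phi_uniform = shaped_projection(p_uniform, rng)
phi_peaked = shaped_projection(p_peaked, rng)
energy_on_entry_0 = phi_peaked[0] ** 2         # fraction of sensing energy spent on entry 0
```

Because the entries of $p$ sum to one, both vectors have unit Euclidean norm, so shaping redistributes a fixed sensing energy and does not increase it.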

A  way  of  characterizing  the  information  content  of  Y(t)  is  to  compute  its  differential  entropy,  as  defined 
in [7]. In other words we want to find $\phi(t+1)$ solving

$$\arg\max_{h:\|h\|_2 = 1} H\left(h^T X(t) + W(t+1)\right), \tag{4}$$

where $H(\cdot)$ is the differential entropy and $X(t)$ is a random variable distributed according to our generative
model at step $t$. In other words, $X(t)$ reflects our knowledge of $x$ at time $t$. Now note that under our model
$h^T X(t)$ is distributed as a Gaussian mixture with $n$ components (recall that at most one entry of $X(t)$ is
non-zero). In particular the density of $h^T X(t)$ is


$$\sum_{i=1}^n \frac{p_i(t)}{\sqrt{2\pi h_i^2 \sigma_i^2(t)}} \exp\left(-\frac{\left(x - h_i\mu_i(t)\right)^2}{2 h_i^2 \sigma_i^2(t)}\right).$$


There  is  no  closed  form  expression  for  the  differential  entropy  of  a  Gaussian  mixture.  Instead,  using  the  fact 
that  the  conditional  differential  entropy  is  a  lower  bound  for  the  differential  entropy  [7],  and  conditioning  on 
the  selection  of  the  mixture  component,  we  obtain 


$$H\left(h^T X(t)\right) \geq \frac{1}{2}\log\left(2\pi e \prod_{i=1}^n \left(h_i^2 \sigma_i^2(t)\right)^{p_i(t)}\right).$$
Replacing the entropy in (4) by the lower bound yields

$$\phi(t+1) = \arg\max_{h:\|h\|_2 = 1} \frac{1}{2}\log\left(2\pi e \prod_{i=1}^n \left(h_i^2 \sigma_i^2(t)\right)^{p_i(t)}\right) = \arg\max_{h:\|h\|_2 = 1} \sum_{i=1}^n p_i(t)\log(h_i^2).$$

It is easily shown that $\phi_i(t+1) = \pm\sqrt{p_i(t)}$, which motivates our choice of projection vectors.
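
The final step can be filled in with a standard Lagrange multiplier argument. The substitution $q_i = h_i^2$ and the multiplier $\lambda$ are notation introduced here for illustration; the terms involving $\sigma_i^2(t)$ do not depend on $h$ and drop out of the maximization.

```latex
\begin{align*}
&\text{maximize } \sum_{i=1}^n p_i(t)\log q_i
  \quad \text{subject to } \sum_{i=1}^n q_i = 1,\ q_i \ge 0,
  \qquad \text{where } q_i = h_i^2, \\
&\frac{\partial}{\partial q_i}\left[\sum_{j=1}^n p_j(t)\log q_j
  - \lambda\Big(\sum_{j=1}^n q_j - 1\Big)\right]
  = \frac{p_i(t)}{q_i} - \lambda = 0
  \;\Longrightarrow\; q_i = \frac{p_i(t)}{\lambda}, \\
&\sum_{i=1}^n q_i = 1 \ \text{and}\ \sum_{i=1}^n p_i(t) = 1
  \;\Longrightarrow\; \lambda = 1,
  \quad h_i^2 = p_i(t),
  \quad h_i = \pm\sqrt{p_i(t)}.
\end{align*}
```

The objective is concave in $q$, so this stationary point is the maximizer; the same conclusion follows from Gibbs' inequality, $\sum_i p_i \log q_i \le \sum_i p_i \log p_i$.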

When a budget of $k$ projective observations is allowed, one can use the above algorithm to collect all the
observations, and the final estimate can be computed from the posterior (the estimate should be chosen
to minimize the desired cost function). If optimizing mean squared error, then the best estimate is simply
the posterior mean

$$\hat{x}_k = \left(p_1(k)\mu_1(k), \ldots, p_n(k)\mu_n(k)\right).$$

4.2  Multiple  Needles  in  the  Haystack 

Here we describe a modification of the procedure above when multiple entries of the signal are active (i.e.,
$x$ might have more than a single non-zero entry). The idea is to search for the significant entries of $x$ one
at a time, using the previously developed method. Once an entry is found, no more observation energy is
allocated to it. As time proceeds one gets closer to the single needle model.

The procedure starts exactly as in the single spike case, and proceeds until one entry of $p(t)$ exceeds a
threshold, say 0.9. At this point we infer there is significant signal value in the corresponding location, and
proceed  by  measuring  that  entry  directly  using  a  projection  vector  that  is  just  a  singleton.  The  observed 
value  becomes  our  estimate  for  the  signal  value  at  that  location.  We  then  restart  the  entire  estimation 
procedure,  but  zero-out  in  p(t  +  1)  the  entry  that  we  just  measured.  All  the  other  entries  of  p(t  +  1)  are 
equal  (uniform  prior).  The  procedure  is  iterated  until  the  observation  budget  is  expended.  Unlike  in  the 
single  needle  model  it  is  important  to  measure  each  detected  entry  directly  because  model  mismatch  often 
makes  the  estimates  obtained  directly  from  the  algorithm  inaccurate. 
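
The multiple-needle procedure can be sketched as a loop around the single-needle update. The code below is an illustrative reading of the description above, not the authors' implementation: the function name `adaptive_multi_needle`, the prior variance of 1, the problem size, and the noise level are all assumptions.

```python
import numpy as np

def adaptive_multi_needle(x, budget, noise_var, prior_var=1.0, thresh=0.9, seed=0):
    """Search for non-zero entries of x one at a time; returns {index: directly measured value}."""
    rng = np.random.default_rng(seed)
    n = x.size
    found = {}
    active = np.ones(n, dtype=bool)            # entries that still receive sensing energy

    def observe(phi):
        return phi @ x + np.sqrt(noise_var) * rng.standard_normal()

    def restart():
        # Uniform prior over the entries that have not been measured directly yet.
        return active / active.sum(), np.zeros(n), np.full(n, prior_var)

    p, mu, var = restart()
    used = 0
    while used < budget and active.any():
        phi = np.sqrt(p) * rng.choice([-1.0, 1.0], size=n)   # shaped projection
        y = observe(phi)
        used += 1
        s = phi**2 * var + noise_var
        with np.errstate(divide="ignore"):                    # log(0) for excluded entries
            log_w = np.log(p) - 0.5 * np.log(s) - 0.5 * (y - phi * mu) ** 2 / s
        mu = (phi * var * y + mu * noise_var) / s
        var = var * noise_var / s
        w = np.exp(log_w - log_w.max())
        p = w / w.sum()
        i = int(np.argmax(p))
        if p[i] > thresh and used < budget:
            e = np.zeros(n)
            e[i] = 1.0
            found[i] = observe(e)                             # measure the detected entry directly
            used += 1
            active[i] = False                                 # no more energy for this entry
            if active.any():
                p, mu, var = restart()
    return found

rng = np.random.default_rng(1)
n, m, noise_var = 256, 3, 1e-3
x = np.zeros(n)
x[rng.choice(n, m, replace=False)] = 1 / np.sqrt(m)           # three equal-amplitude needles
found = adaptive_multi_needle(x, budget=300, noise_var=noise_var)
```

Each detection costs at least one shaped projection plus one direct measurement, so at most half the budget can end up as detected entries. The direct measurement is what protects the amplitude estimate against the model mismatch noted above.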


5  Experimental  Comparison 

In  this  section  we  demonstrate  the  benefits  of  our  proposed  adaptive  procedure  relative  to  traditional  random 
projections  in  several  recovery  tasks.  First,  we  show  that  our  adaptive  procedure  can  identify  true  signal 
components much more effectively than orthogonal matching pursuit (OMP) [8] applied to standard
(non-adaptive) random projection observations. To achieve comparable performance, OMP requires as many as
15 to 30 times the number of observations used by the adaptive procedure. Second, we demonstrate that our adaptive
sampling procedure often yields lower average reconstruction errors than standard random projections, and
the benefit becomes more pronounced as the noise power increases. For all experiments, we considered target
signals $x \in \mathbb{R}^n$, $n = 2^{13}$, with $m = 15$ non-zero entries of the same amplitude (with random signs) at random
locations, and we enforced $\|x\|_2 = 1$. Noise power is quantified by the SNR, $S = \|x\|_2^2 / (n\sigma_w^2)$, where $\sigma_w^2$ is the noise variance.
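
For concreteness, the test signals and the noise variance implied by a given SNR can be generated as follows. This is a sketch assuming $n = 2^{13}$ and the SNR definition $S = \|x\|_2^2 / (n\sigma^2)$; the function names and the random seed are hypothetical.

```python
import numpy as np

def make_test_signal(n=2**13, m=15, seed=0):
    """Unit-norm signal with m non-zero entries of equal amplitude and random signs."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    support = rng.choice(n, size=m, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    return x, np.sort(support)

def noise_variance(x, snr):
    """Noise variance implied by S = ||x||_2^2 / (n * sigma^2)."""
    return np.sum(x**2) / (x.size * snr)

x, support = make_test_signal()
sigma2_high_snr = noise_variance(x, 10.0)   # S = 10
sigma2_low_snr = noise_variance(x, 0.1)     # S = 0.1
```

With $\|x\|_2 = 1$ the noise variance is simply $1/(nS)$, so $S = 1$ corresponds to a noise variance equal to the average signal energy per dimension.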

5.1  Support  Identification 

First we demonstrate the effectiveness of the adaptive procedure in support identification. For a fixed SNR,
we generated a target signal as above and ran the adaptive procedure until one of the entries of the posterior
probability vector exceeded 0.9. The required number of observations ($k'$) was recorded, along with the index
of the maximum of the posterior vector (the estimate of the support). For comparison we obtained support
estimates using one index-selection step of OMP² applied to collections of non-adaptive random projection
observations (using $n$-vectors with i.i.d. $\pm 1/\sqrt{n}$ entries). The number of non-adaptive observations for each
of the OMP trials was a multiple of $k'$. Each experiment was termed a success if the support estimate
contained the index of at least one true signal component. The average number of observations required
(Average $k'$) for one step of the adaptive procedure and the empirical probabilities of success ($P_s$) for each
setting were determined by averaging over 1000 trials.

Table 1: Empirical probabilities of successful support identification for the adaptive procedure and standard
random projections (using one step of OMP). For high noise levels (small S), more than 15 times as many
random samples are required to achieve the detection performance of the adaptive method.

S                    10      5.0     2.0     1.5     1.0     0.9     0.8     0.5     0.3     0.1
Average k'           16.46   17.09   20.23   21.84   26.56   27.79   30.01   39.94   58.46   153.9
P_s(Adaptive, k')    0.989   0.985   0.960   0.963   0.952   0.953   0.969   0.977   0.978   0.995
P_s(OMP, k')         0.018   0.020   0.016   0.015   0.030   0.021   0.022   0.025   0.030   0.028
P_s(OMP, 5k')        0.485   0.412   0.412   0.379   0.392   0.397   0.387   0.384   0.386   0.419
P_s(OMP, 10k')       0.944   0.927   0.856   0.860   0.836   0.808   0.812   0.774   0.761   0.783
P_s(OMP, 15k')       0.993   0.994   0.982   0.981   0.967   0.966   0.962   0.938   0.910   0.891
P_s(OMP, 30k')       1.000   1.000   1.000   1.000   0.998   1.000   1.000   0.998   0.994   0.993

² The OMP index-selection step identifies the index $i$ (or indices, in the case of a tie) for which $|r_i| = \max_j |r_j| = \|r\|_\infty$,
where $r = \Phi^T y$.
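
The OMP index-selection step and the success criterion used in this comparison amount to a few lines. The sketch below is illustrative; the function names `omp_select` and `is_success` and the toy inputs are assumptions.

```python
import numpy as np

def omp_select(Phi, y):
    """Indices i maximizing |r_i| with r = Phi^T y; ties are returned together."""
    mag = np.abs(Phi.T @ y)
    return np.flatnonzero(mag == mag.max())

def is_success(selected, support):
    """A trial is a success if any selected index belongs to the true support."""
    return bool(np.intersect1d(selected, support).size)

Phi = np.eye(3)                                              # toy sampling matrix: r equals y
single = omp_select(Phi, np.array([0.1, -2.0, 0.5]))         # largest magnitude at index 1
tied = omp_select(Phi, np.array([1.0, -1.0, 0.0]))           # indices 0 and 1 tie
hit = is_success(single, support=np.array([1, 2]))
miss = is_success(single, support=np.array([0, 2]))
```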

The  results  are  given  in  Table  1.  We  see  that  adaptive  sampling  clearly  outperforms  random  sampling, 
and  in  some  cases  up  to  30  times  as  many  random  samples  are  required  to  achieve  the  detection  performance 
of  the  adaptive  method.  It  is  also  interesting  to  note  that  the  adaptive  procedure  consistently  identified 
true  components  of  the  signal  with  less  than  5%  error  for  each  SNR  considered.  The  increasing  noise  power 
essentially affected only the number of observations needed for the algorithm to converge to a true component.

5.2  Signal  Reconstruction 

Next  we  demonstrate  the  advantage  of  adaptive  samples  over  random  projections  for  signal  reconstruction. 
To  ascertain  the  effectiveness  of  the  sampling  procedure  (independent  of  the  reconstruction  algorithm)  we 
reconstruct  in  each  case  using  (2)  followed  by  debiasing.  In  addition,  we  eliminated  the  dependence  of  (2)  on 
the  regularization  parameter  by  clairvoyantly  selecting  the  value  that  gave  the  reconstruction  with  the  lowest 
mean-square  error  (MSE).  We  used  the  GPSR  (Gradient  Projection  for  Sparse  Reconstruction)  software  [9] 
to  efficiently  perform  the  optimization. 

Fixing the number of observations $k$, we ran each sampling procedure to obtain the associated sampling
matrices and observation vectors. Estimates $\hat{x} = \hat{x}(\alpha)$ were obtained for 41 distinct values of $\tau$, given by
$\tau = \alpha\|\Phi^T y\|_\infty$, where $\alpha$ ranged from 0 to 1 uniformly in increments of 0.025, and for each estimate the
mean-square error $\|\hat{x}(\alpha) - x\|_2^2$ was computed.³ The error associated with a given sampling procedure was
chosen to be the minimum error achieved over all tested values of $\alpha$. This entire procedure was performed 40
times for each value of $k$ and the resulting minimum MSEs were averaged. The results of this experiment
for two different noise levels ($S = 10$ and $S = 1.0$) are shown in Fig. 1(a) and (b), respectively.

The data in Table 1 suggest that the adaptive procedure sequentially identifies true components of the
signal, and the number of observations for each discovery depends on the SNR. Thus, it is natural to predict
that the reconstruction error of the adaptive procedure will qualitatively match the best approximation error
of the target signal. Since all of the non-zero entries have the same amplitude, the (noise-free) approximation
error will decay linearly in the number of components that are identified: retaining $T$ components gives
a squared approximation error of $1 - T(1/m)$. For the low noise setting simulated in Fig. 1(a), the data
in Table 1 suggest that one true signal component is identified for every 16.5 observations, resulting in a
predicted MSE of $1 - (k/16.5)(1/m)$ and full signal recovery after $(16.5)(15) \approx 250$ observations. This agrees
with the observed behavior except that as the SNR decreases, the slope of the error decay changes with
the instantaneous SNR, explaining the “flattening” of the curve. The same behavior is exhibited in the
higher-noise setting.
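
The linear prediction is easy to tabulate. The helper below is a small illustration (the name `predicted_mse` is hypothetical); it uses the figure of 16.5 observations per discovered component and $m = 15$, and floors the prediction at zero once all components have been found.

```python
def predicted_mse(k, obs_per_component=16.5, m=15):
    """Predicted squared error 1 - (k / obs_per_component) * (1 / m), floored at zero."""
    return max(0.0, 1.0 - (k / obs_per_component) / m)

mse_at_0 = predicted_mse(0)          # nothing identified yet
mse_at_100 = predicted_mse(100)      # roughly six components identified
mse_at_full = predicted_mse(247.5)   # (16.5)(15) observations: all components identified
```

For example, after 100 observations the prediction is $1 - (100/16.5)/15 \approx 0.60$.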

The reconstruction errors using random projections exhibit a different behavior. When the SNR is
high the performance is well-predicted by noiseless CS results: the reconstruction error decays to zero
exponentially in the number of observations, provided enough observations are collected to ensure that
certain submatrices of the observation matrix are well-conditioned. This explains the transitional error
behavior for traditional compressed sensing that is apparent in Fig. 1(a). As the noise level increases, the
rate of error decay becomes only polynomial in the number of observations (see (3)). It is also interesting
to note that when the number of observations is less than about 50 in Fig. 1(a) and 100 in Fig. 1(b), the
adaptive procedure succeeds at identifying some of the true signal components while the best reconstructions
using random projections have MSE comparable to the all-zero solution.

Figure 1: MSE comparisons between reconstructions obtained from adaptive samples and random projections
(solid and dashed lines, respectively) for S = 10 and S = 1.0.

³ As noted in [9], choosing $\tau = \|\Phi^T y\|_\infty$ guarantees an all-zero solution while $\tau = 0$ gives the least-squares solution, so this
parametrization covers the entire usable range of parameter values.

6  Conclusions  and  Open  Problems 

This  paper  presented  a  novel  adaptive  scheme  for  compressive  sensing  and  demonstrated  that  it  improves 
performance in many situations compared to non-adaptive random projection methods, providing evidence
that  while  non-adaptive  random  projections  are  effective  in  noiseless  situations,  adaptivity  can  be  very 
helpful  in  real-world  problems.  We  compared  our  approach  with  the  adaptive  projection  method  of  [5],  and 
although  the  performance  of  the  latter  is  competitive,  it  is  only  computationally  feasible  for  relatively  small 
problem  sizes,  making  it  intractable  for  the  settings  considered  in  this  paper.  Currently,  we  are  investigating 
methodologies  with  provable  performance,  in  the  spirit  of  [6],  which  also  provides  evidence  that  adaptive 
sampling  can  outperform  compressed  sensing  in  noisy  conditions. 
Part II

Toeplitz-Structured Compressed Sensing Matrices


7  Introduction 


7.1  Background 

We  begin  by  revisiting  the  problem  of  recovering  a  signal  from  linear  observations  of  the  form 

$$y = Ax, \quad \|x\|_0 \leq m, \tag{5}$$

where $\|\cdot\|_0$ counts the number of non-zero entries in a vector, and $A \in \mathbb{R}^{k \times n}$ is a known matrix. Of particular
interest is the special case of a highly underdetermined system, $k \ll n$, which has applications in many areas of signal
processing such as data compression, image processing, dimensionality reduction, etc., and has recently received a lot
of attention under the rubric of compressed sensing (CS), starting in particular with some of the earlier works of
Candès, Romberg and Tao [1-3] and Donoho [4].

One of the fundamental problems in CS is to identify the observation matrices that are sufficient to ensure exact
recovery of $x$ from $y$; we term such matrices CS matrices. Independently, Donoho [4], and Candes and Tao [1,3]
have provided sufficient conditions for CS matrices. In particular, it was established in [3] (and refined in [1]) that
for a $k \times n$ observation matrix $A$ to be a CS matrix, it is sufficient that it satisfies the restricted isometry property (RIP)
of order $3m$ in the following sense: let $T \subset \{1, 2, \ldots, n\}$ and let $A_T$ be the $k \times |T|$ submatrix obtained by retaining the
columns of $A$ corresponding to the indices in $T$; then, there exists a constant $\delta_{3m} \in (0, 1/3)$ such that

$$\forall\, z \in \mathbb{R}^{|T|}, \qquad (1 - \delta_{3m})\|z\|_2^2 \le \|A_T z\|_2^2 \le (1 + \delta_{3m})\|z\|_2^2 \tag{6}$$

holds for all subsets $T$ with $|T| \le 3m$.$^4$ Moreover, it was also shown in [1] that $x$ can be exactly recovered in that
case by the convex program


$$\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_1 \quad \text{subject to} \quad y = Az, \tag{7}$$


which is attractive because it can be solved in a computationally tractable manner using linear programming and
convex optimization techniques; see, e.g., [1,4,5]. Note that the RIP of order $3m$ is equivalent to saying that the
singular values of all $k \times 3m$ submatrices of $A$ lie in the interval $\left(\sqrt{2/3}, \sqrt{4/3}\right)$. And while the definition of RIP does
not guarantee the existence of CS matrices, recent work has shown that (appropriately scaled) random matrices with
entries drawn independently from certain probability distributions satisfy RIP of order $3m$ with high probability for
every $\delta_{3m} \in (0, 1/3)$ provided $k \ge \text{const} \cdot m \ln(n/m)$ (see, e.g., [1,3,4,6]); we refer to such matrices as independent
and identically distributed (IID) CS matrices.


7.2  Contribution 


We show here that a $k \times n$ (partial) Toeplitz matrix $A$ of the form

$$A = \begin{bmatrix} a_n & a_{n-1} & \cdots & a_2 & a_1 \\ a_{n+1} & a_n & \cdots & a_3 & a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n+k-1} & a_{n+k-2} & \cdots & a_{k+1} & a_k \end{bmatrix} \tag{8}$$

where the entries $\{a_i\}_{i=1}^{n+k-1}$ are independent $\pm 1/\sqrt{k}$, each with probability 1/2, is also a CS matrix in the sense that it
satisfies RIP of order $3m$ with high probability for every $\delta_{3m} \in (0, 1/3)$ provided $k \ge \text{const} \cdot m^2 \ln(n)$. Essentially, the
reduction in the number of degrees of freedom (DoF) of a Toeplitz random matrix seems to result in an increase in the
required number of observations. Note, however, that the result established in this paper is a sufficient condition for


4 This  is  a  slightly  weaker  version  of  the  sufficient  condition  originally  given  by  Candes  and  Tao;  for  the  sake  of  brevity, 
however,  and  because  it  suffices  to  illustrate  the  principles,  we  limit  ourselves  to  this  condition  and  refer  the  reader  to  [1,3]  for 
further  details. 
exact recovery of all $m$-sparse signals, and simulation results show that the actual performance of Toeplitz CS matrices
tends to be comparable to that of IID CS matrices for many, if not all, such signals. The proof technique used for
obtaining this sufficient condition is an application of Gersgorin's Circle Theorem, augmented with a novel approach
to dealing with statistical dependencies.

The use of Toeplitz CS matrices is a desirable alternative for a number of application areas because (i) IID CS
matrices require generation of $O(kn)$ independent random variables, which could be particularly troublesome for large-scale
applications, whereas Toeplitz CS matrices require generation of only $O(n)$ independent random variables; (ii)
multiplication with IID CS matrices requires $O(kn)$ operations, resulting in longer data acquisition and reconstruction
times, while multiplication with a Toeplitz CS matrix can be efficiently implemented using the fast Fourier transform
(FFT) and consequently requires only $O(n \log_2(n))$ operations; and (iii) Toeplitz-structured matrices arise naturally
in certain application areas such as identification of a linear time-invariant (LTI) system and consequently, IID CS
matrix results are not applicable in such cases.
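To make point (ii) concrete, the following is a minimal sketch (not taken from the report) of how multiplication by a partial Toeplitz matrix can be carried out with the FFT. It assumes 0-based indexing, so that the entry in row $\ell$ and column $j$ is `a[n - 1 - j + l]`; the function names `toeplitz_dense` and `toeplitz_matvec_fft` and the problem sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def toeplitz_dense(a, k, n):
    """Dense k x n partial Toeplitz matrix, A[l, j] = a[n - 1 - j + l] (0-based)."""
    rows = np.arange(k)[:, None]
    cols = np.arange(n)[None, :]
    return a[n - 1 + rows - cols]

def toeplitz_matvec_fft(a, x, k):
    """Compute A @ x through one FFT-based linear convolution of a and x."""
    n = x.size
    size = a.size + n - 1                      # length of the full linear convolution
    full = np.fft.irfft(np.fft.rfft(a, size) * np.fft.rfft(x, size), size)
    return full[n - 1:n - 1 + k]               # the k lags that correspond to rows of A

rng = np.random.default_rng(0)
n, k = 256, 64                                 # illustrative sizes (assumption)
a = rng.choice([-1.0, 1.0], size=n + k - 1) / np.sqrt(k)
x = np.zeros(n)
x[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)

y_dense = toeplitz_dense(a, k, n) @ x          # O(kn) reference computation
y_fft = toeplitz_matvec_fft(a, x, k)           # FFT-based computation
max_err = np.max(np.abs(y_dense - y_fft))
```

The dense product is kept only as a reference; in a large-scale setting one would store the generating sequence `a` alone and never form the matrix.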

7.3  Organization 

The  rest  of  this  paper  is  organized  as  follows.  In  Section  8,  we  prove  that  a  Toeplitz  matrix  of  the  form  given  in  (8) 
satisfies  RIP  with  high  probability.  In  Section  9,  we  discuss  extensions  of  the  result  of  Section  8  to  circulant  matrices, 
left-shifted  Toeplitz-structured  matrices,  identification  of  LTI  systems  having  sparse  impulse  responses  and  recovery  of 
signals  that  are  sparse  in  some  transform  domain.  In  Section  10,  we  numerically  compare  the  performance  of  Toeplitz 
and  circulant  CS  matrices  to  that  of  IID  ones  and  finally,  in  Section  11,  we  present  some  concluding  remarks. 


8  Main  Result 

The  following  result  quantifies  the  effectiveness  of  Toeplitz  structured  sensing  matrices  [7]. 

Theorem 1. Let $\{a_i\}_{i=1}^{n+k-1}$ be a sequence of i.i.d. $\pm 1/\sqrt{k}$ random variables taking each value with probability 1/2.
When $k \ge c_1 \cdot m^2 \cdot \log n$, the $k \times n$ Toeplitz matrix (8) generated by this sequence satisfies RIP of order $m$ with
$\delta_m \in (0, 1/3)$ with probability exceeding $1 - \exp(-c_2 \cdot k/m^2)$. Here, $c_1$ and $c_2$ are constants that depend on $\delta_m$ but
not on $n$ or $k$.

Proof. Let $T \subset \{1, 2, \ldots, n\}$ be a subset of indices of cardinality $|T|$, and let $A_T$ be the $k \times |T|$ submatrix of $A$ formed
by retaining the columns indexed by the entries of $T$. We need to show that for all subsets $T$ satisfying $|T| = m$, the
eigenvalues of the Gram matrix $G(T) = A_T' A_T$ lie in the interval $[1 - \delta, 1 + \delta]$, where we write $\delta$ for $\delta_m$. For a fixed subset $T$, this condition
can be established using Gersgorin's circle theorem, which states that the eigenvalues of an $m \times m$ matrix $G$ all lie
in the union of $m$ discs, where the $i$-th disc is centered at the diagonal entry $G_{i,i}$ and has radius

$$R_i = \sum_{j=1,\, j \ne i}^{m} |G_{i,j}|. \tag{9}$$

Notice that by choice of the $a_i$'s, $G_{i,i}(T) = 1$ deterministically. Thus, to establish that the eigenvalues lie in $[1 - \delta, 1 + \delta]$
for a fixed $T$, it is sufficient to show that the off-diagonal entries of $G(T)$ are all less than $\delta/m$ in absolute value,
since this would imply $R_i \le (m - 1)(\delta/m) < \delta$ for all $i$.

To guarantee the RIP condition, however, the eigenvalue bounds must hold for all subsets $T$ that satisfy $|T| = m$.
To this end, we consider the full $n \times n$ Gram matrix of $A$, $G = A'A$, and show that the off-diagonal entries of $G$ are
all bounded above by $\delta/m$ in absolute value. The implication is that, since the Gram matrix $G(T)$ corresponding
to any subset $T$ satisfying $|T| = m$ is itself a submatrix of $G$, $G(T)$ has bounded off-diagonals and, therefore, the
eigenvalues of all $\binom{n}{m}$ Gram matrices $G(T)$ lie in $[1 - \delta, 1 + \delta]$.

To proceed, notice that each off-diagonal term of $G$ is simply the inner product between the $i$-th and $j$-th columns of
$A$, and thus $G_{i,j} = G_{j,i}$. We can write an expression for the off-diagonal element $G_{i,j}$ as

$$G_{i,j} = \sum_{\ell=1}^{k} a_{n-i+\ell}\, a_{n-j+\ell}. \tag{10}$$

Standard concentration inequalities are not directly applicable here because the entries in the sum are not all
mutually independent. For example, consider $i = n - 1$, $j = n$, and $k = 4$. Then $G_{n-1,n} = a_1 a_2 + a_2 a_3 + a_3 a_4 + a_4 a_5$,
and the first two terms are dependent (through $a_2$), as are the second and third (through $a_3$), etc. But notice that
the first and third terms are independent, as are the second and fourth. Overall this sum may be split into two sums
of i.i.d. random variables, where each component sum is formed simply by grouping alternating terms. The number
of terms in each sum is either the same (if $k$ is even) or differs by one (if $k$ is odd).
In fact this decomposition is possible for every $G_{i,j}$, and this observation provides the key to tolerating the
dependencies that arise from the structure in the sensing matrix. Note that the terms in any such sum are each
dependent with at most two other terms in the sum. Thus, each sum can be rearranged such that the dependent
terms are "chained"; that is, the $\ell$-th (rearranged) term is dependent with (at most) the $(\ell - 1)$-st and the
$(\ell + 1)$-st terms. This rearranged sum has the same structure as the example above, and can be split in a similar
fashion simply by grouping alternating terms.
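The splitting idea can be illustrated with a short sketch. The grouping rule below (alternating blocks of $d = |j - i|$ consecutive terms) is an assumption made for this illustration and is not the report's chaining argument verbatim; it yields two groups in which no entry of the generating sequence appears twice, although the two groups need not have equal sizes when $d > 1$. The names `split_terms` and `indices_disjoint` are hypothetical.

```python
def split_terms(i, j, n, k):
    """Split the k products a_{n-i+l} * a_{n-j+l} of G_{i,j} into two groups.

    Terms l and l + d (d = |j - i|) share one factor, so alternating blocks of
    d consecutive terms are sent to different groups.
    """
    d = abs(j - i)
    groups = ([], [])
    for ell in range(1, k + 1):
        term = (n - i + ell, n - j + ell)      # 1-based indices of the two factors
        groups[((ell - 1) // d) % 2].append(term)
    return groups

def indices_disjoint(terms):
    """True if no index of the generating sequence is used twice within the group."""
    used = [idx for term in terms for idx in term]
    return len(used) == len(set(used))

# The example from the text: i = n - 1, j = n, k = 4 (n = 8 chosen arbitrarily).
example = split_terms(7, 8, 8, 4)

# Check every column pair of a small matrix (sizes are illustrative assumptions).
n_small, k_small = 10, 7
all_ok = all(
    indices_disjoint(g)
    for i in range(1, n_small + 1)
    for j in range(i + 1, n_small + 1)
    for g in split_terms(i, j, n_small, k_small)
)
```

For the example in the text, the first group collects $a_2 a_1$ and $a_4 a_3$ and the second collects $a_3 a_2$ and $a_5 a_4$, which matches the grouping of alternating terms described above.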

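When $k$ is even, each sum can be decomposed as

$$G_{i,j} = \sum_{\ell=1}^{q_1 = k/2} g_\ell + \sum_{\ell=1}^{q_2 = k/2} g'_\ell, \tag{11}$$

where $g_\ell$ and $g'_\ell$ denote the rearranged and reindexed terms (which are now $\pm 1/k$ random variables), while

$$G_{i,j} = \sum_{\ell=1}^{q_1 = (k-1)/2} g_\ell + \sum_{\ell=1}^{q_2 = (k+1)/2} g'_\ell \tag{12}$$

when $k$ is odd. Generically, we write $G_{i,j} = G^1_{i,j} + G^2_{i,j}$. We analyze each component sum using Hoeffding's (two-sided)
inequality for bounded random variables to obtain, for example,

$$\Pr\left(|G^1_{i,j}| > \epsilon\right) \le 2\exp\left(-\frac{\epsilon^2 k^2}{2 q_1}\right), \tag{13}$$

and choosing $\epsilon = \delta/2m$ yields

$$\Pr\left(|G^1_{i,j}| > \delta/2m\right) \le 2\exp\left(-\frac{\delta^2 k^2}{8 m^2 q_1}\right). \tag{14}$$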

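Considering both sums, we can write

$$\begin{aligned} \Pr\left(|G_{i,j}| > \delta/m\right) &\le \Pr\left(\{|G^1_{i,j}| > \delta/2m\} \text{ or } \{|G^2_{i,j}| > \delta/2m\}\right) \\ &\le 2\max\left\{\Pr\left(|G^1_{i,j}| > \delta/2m\right), \Pr\left(|G^2_{i,j}| > \delta/2m\right)\right\} \\ &\le 2\max\left\{2\exp\left(-\frac{\delta^2 k^2}{8 m^2 q_1}\right), 2\exp\left(-\frac{\delta^2 k^2}{8 m^2 q_2}\right)\right\}. \end{aligned} \tag{15}$$

Notice that smaller values of $q_1$ and $q_2$ lead to tighter bounds, and thus the slowest rate of concentration occurs when
the number of nonzero terms in $G_{i,j}$ is largest. This occurs when $k$ is odd, and $q_2 = (k + 1)/2$. Using the (loose)
upper bounds $q_1 \le q_2 \le k$, we obtain

$$\Pr\left(|G_{i,j}| > \delta/m\right) \le 4\exp\left(-\frac{\delta^2 k}{8 m^2}\right). \tag{16}$$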

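To establish RIP we require that each of the $n(n - 1)/2$ unique off-diagonal terms $G_{i,j}$ satisfy this bound. Applying
the union bound yields

$$\Pr\left(\text{any } |G_{i,j}| > \delta/m\right) \le 4n^2 \exp\left(-\frac{\delta^2 k}{8 m^2}\right) \le \exp\left(-\frac{\delta^2 k}{8 m^2} + 3\log n\right), \tag{17}$$

where the last step follows under the mild assumption that $n \ge 4$. Now, notice that whenever $\delta^2 k/8m^2 > 3\log n$, or
$k > \frac{24 m^2 \log n}{\delta^2}$, RIP is satisfied with probability at least

$$1 - \exp\left(-\frac{\delta^2 k}{8 m^2} + 3\log n\right). \tag{18}$$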

This  success  probability  is  nonzero  and  can  be  very  close  to  one  when  k  is  large  compared  to  m2.  □ 
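The Gersgorin step of the proof can be checked numerically. The sketch below is an illustration under assumed sizes ($n$, $k$, $m$ are arbitrary choices, not values from the report): it builds one random Toeplitz matrix of the form (8), confirms that the diagonal of the Gram matrix equals one, and verifies that the eigenvalues of a randomly chosen $m \times m$ Gram submatrix stay within $(m - 1)$ times the largest off-diagonal magnitude of the full Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 64, 1024, 4                          # illustrative sizes (assumption)

a = rng.choice([-1.0, 1.0], size=n + k - 1) / np.sqrt(k)
rows = np.arange(k)[:, None]
cols = np.arange(n)[None, :]
A = a[n - 1 + rows - cols]                     # k x n partial Toeplitz matrix as in (8)

G = A.T @ A                                    # full n x n Gram matrix
diag_err = np.max(np.abs(np.diag(G) - 1.0))    # diagonal entries equal 1 by construction
max_off = np.max(np.abs(G - np.diag(np.diag(G))))

# Any m-column Gram submatrix inherits the off-diagonal bound of G, so each
# Gersgorin disc is centered at 1 with radius at most (m - 1) * max_off.
T = rng.choice(n, size=m, replace=False)
eigs = np.linalg.eigvalsh(G[np.ix_(T, T)])
radius_bound = (m - 1) * max_off
```

Comparing `max_off` with $\delta/m$ for a target $\delta$ shows whether this particular draw meets the sufficient condition used in the proof; that comparison depends on the random draw, so no outcome is asserted here.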

Before discussing natural extensions to Toeplitz CS matrices, it is instructive to compare the result of Theorem 1
with that for IID CS matrices. Specifically, previous work has shown that IID CS matrices generated from certain
distributions satisfy RIP of order $3m$ for every $\delta_{3m} \in (0, 1/3)$ with probability $\ge 1 - e^{-c_2 k}$ provided $k \ge c_1 m \ln(n/m)$,
where $c_1, c_2 > 0$ are constants depending only on $\delta_{3m}$; see, e.g., [3,6]. It might be tempting, therefore, to conclude
that reduction in the number of DoFs of a Toeplitz matrix from $O(kn)$ to $O(n)$ results in a factor of $O(m)$ increase
in the required number of observations. One needs to apply caution, however, as Theorem 1 bounds the worst case
performance of Toeplitz CS matrices for all $m$-sparse signals and it might very well be that this oversampling is not
required for most signals in the class. Extensive simulations carried out for a number of $m$-sparse signals using IID
and Toeplitz matrices of equal dimensions, in fact, support this intuition. It is also interesting to note that somewhat
similar numerical results (without any performance guarantees) have been reported in [8] in the context of random
filters.


9  Extensions 

In  this  section,  we  discuss  natural  extensions  of  the  result  of  Section  8  to  circulant  and  left-shifted  Toeplitz-structured 
matrices.  Further,  we  also  describe  how  the  results  for  Toeplitz-structured  CS  matrices  lend  themselves  to  (i) 
identification  of  LTI  systems  having  sparse  impulse  responses;  and  (ii)  recovery  of  signals  that  are  either  piecewise 
constant  (PWC)  or  sparse  in  the  Haar  wavelet  domain. 


9.1  Circulant  CS  Matrices 

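Theorem 2. Suppose that $n$, $m$ are given, and let $A$ be a $k \times n$ (partial) circulant matrix of the form

$$A = \begin{bmatrix} a_n & a_{n-1} & \cdots & a_2 & a_1 \\ a_1 & a_n & \cdots & a_3 & a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{k-1} & a_{k-2} & \cdots & a_{k+1} & a_k \end{bmatrix} \tag{19}$$

where the entries $\{a_i\}_{i=1}^{n}$ are $\pm 1/\sqrt{k}$, each with probability 1/2. Then, there exist constants $c_1^*, c_2^* > 0$ depending only
on $\delta_{3m}$ such that for any $k \ge c_1^* m^2 \ln(n)$, $A$ satisfies RIP of order $3m$ for every $\delta_{3m} \in (0, 1)$ with probability at least

$$1 - e^{-c_2^* k/m^2}. \tag{20}$$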

Sketch  of  Proof.  The  same  proof  applies  here  as  in  the  Toeplitz  case,  as  the  dependency  structure  among  columns  is 
the  same  as  in  the  original  setting.  □ 


9.2  Left-shifted  Toeplitz  and  Circulant  CS  Matrices 

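The results of Theorems 1 and 2 apply equally well to left-shifted Toeplitz and circulant matrices of the form

$$\begin{bmatrix} a_1 & a_2 & \cdots & a_{n-1} & a_n \\ a_2 & a_3 & \cdots & a_n & a_{n+1} \\ \vdots & \vdots & & \vdots & \vdots \\ a_k & a_{k+1} & \cdots & a_{n+k-2} & a_{n+k-1} \end{bmatrix} \tag{21}$$

and

$$\begin{bmatrix} a_1 & a_2 & \cdots & a_{n-1} & a_n \\ a_2 & a_3 & \cdots & a_n & a_1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_k & a_{k+1} & \cdots & a_{k-2} & a_{k-1} \end{bmatrix} \tag{22}$$

because the dependency structures among columns are the same as in the original case.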


9.3  System  Identification 

The  area  of  estimation  of  the  impulse  response  of  an  LTI  system  from  the  knowledge  of  its  input  and  output 
signals,  commonly  termed  as  system  identification,  is  of  considerable  importance  in  signal  processing  because  of  its 
applicability  to  a  wide  range  of  problems  -  see,  e.g.,  [9, 10].  In  the  case  of  a  finite  impulse  response  (FIR)  LTI  system, 
this  typically  involves  probing  the  system  with  a  (known)  white  noise  sequence  of  duration  orders  of  magnitude 
greater  than  that  of  the  impulse  response  [11],  which  may  be  prohibitive  because  of  the  delay  incurred  in  solving  for 
the  impulse  response  and  the  difficulty  of  generating  a  truly  white  noise  sequence.  For  the  purposes  of  deconvolving 
an LTI system having a sparse impulse response, however, a more promising alternative is to appeal to the results of
Section 8.

As an illustration, let $x[\ell]$ be an $m$-sparse impulse response of an LTI system (of duration $n$) and $a[\ell]$ be an IID
sequence of duration $(n + k - 1)$ that has been drawn from one of the probability distributions given in (??). Then,
probing the given system with $a[\ell]$ yields $y[\ell] = a[\ell] * x[\ell]$ and the theory of CS along with Theorem 1 guarantees
that, with high probability, $x[\ell]$ can be exactly recovered by solving the convex program


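$$\hat{x}[\ell] = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad y = Az, \tag{23}$$

where, in this case,

$$y = \begin{bmatrix} y[n-1] \\ y[n] \\ \vdots \\ y[n+k-2] \end{bmatrix}, \quad \text{and} \quad A = \begin{bmatrix} a[n-1] & a[n-2] & \cdots & a[1] & a[0] \\ a[n] & a[n-1] & \cdots & a[2] & a[1] \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a[n+k-2] & a[n+k-3] & \cdots & a[k] & a[k-1] \end{bmatrix}.$$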


9.4  Beyond  Sparse  Signals 

We have proven above that Toeplitz (and circulant) matrices, having entries drawn independently from probability
distributions that yield IID CS matrices, satisfy RIP of order $3m$ with high probability. Often, we are interested
in signals that are sparse in some transform domain $\Psi \ne I$, i.e., $x = \Psi\theta$ and $\theta \in \mathbb{R}^n$ is $m$-sparse, in which case it
is required that the product matrix $A\Psi$ satisfies RIP of order $3m$ for successful recovery of $\theta$ (and hence $x$). This
is indeed the case when $A$ happens to be an IID CS matrix and $\Psi$ is any orthonormal basis [6]. Toeplitz matrices,
however, seem to lack this universality property because of their highly structured nature. Nevertheless, the results
of Section 8 can still be leveraged to design CS matrices for fixed transformations to retain some of the benefits of
Toeplitz-structured CS matrices such as generation of only $O(n)$ independent random variables, and faster acquisition
and reconstruction algorithms.

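As an illustration, let $x$ be an $m$-piece PWC signal; such a signal can be written as $x = L\theta$, where $\theta \in \mathbb{R}^n$ is
$m$-sparse and $L \in \mathbb{R}^{n \times n}$, the discrete integral transform, is given by

$$L = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ 1 & \cdots & 1 & 1 \end{bmatrix}. \tag{24}$$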


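Further, let $\{a_i\}_{i=1}^{n+k-1}$ be a sequence of independent $\pm 1/\sqrt{k}$ random variables and $A_L \in \mathbb{R}^{k \times n}$ be the cascade of a
$k \times n$ Toeplitz matrix $A$ and the $n \times n$ differencing operator

$$D = \begin{bmatrix} 1 & & & 0 \\ -1 & 1 & & \\ & \ddots & \ddots & \\ 0 & & -1 & 1 \end{bmatrix}, \tag{25}$$

that is,

$$A_L = \begin{bmatrix} (a_n - a_{n-1}) & \cdots & (a_2 - a_1) & a_1 \\ (a_{n+1} - a_n) & \cdots & (a_3 - a_2) & a_2 \\ \vdots & & \vdots & \vdots \\ (a_{n+k-1} - a_{n+k-2}) & \cdots & (a_{k+1} - a_k) & a_k \end{bmatrix}. \tag{26}$$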


Then, by construction, (i) $A_L$ has only $(n + k - 1)$ DoFs; (ii) multiplication with $A_L = AD$ requires only $O(n \log_2(n))$
operations; and (iii) the product matrix $A_L L = ADL = A$ is a Toeplitz CS matrix and consequently, satisfies RIP
with high probability. Likewise, if $x$ happened to be $m$-sparse in the Haar wavelet domain, i.e., $x = W^{-1}\theta$ (where $W^{-1}$ is the inverse
Haar wavelet transform matrix), then a CS matrix of the form $A_W = AW$ would also have these three properties.
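The identity behind property (iii) is easy to check numerically. The following sketch assumes small illustrative sizes and a hand-picked sparse vector `theta` (both assumptions); it builds the discrete integral transform $L$, the differencing operator $D$, and the cascade $A_L = AD$, and confirms that $DL = I$, that $A_L L = A$, and that observing the piecewise constant signal through $A_L$ equals observing its sparse representation through the Toeplitz matrix $A$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 32, 12                                  # illustrative sizes (assumption)

a = rng.choice([-1.0, 1.0], size=n + k - 1) / np.sqrt(k)
rows = np.arange(k)[:, None]
cols = np.arange(n)[None, :]
A = a[n - 1 + rows - cols]                     # k x n partial Toeplitz matrix

L = np.tril(np.ones((n, n)))                   # discrete integral transform
D = np.eye(n) - np.eye(n, k=-1)                # differencing operator: 1 on the diagonal, -1 below
A_L = A @ D                                    # cascade of A and D

theta = np.zeros(n)
theta[[3, 10, 20]] = [1.0, -2.0, 0.5]          # hypothetical jump locations and heights
x = L @ theta                                  # piecewise constant signal

err_inverse = np.max(np.abs(D @ L - np.eye(n)))
err_product = np.max(np.abs(A_L @ L - A))
err_obs = np.max(np.abs(A_L @ x - A @ theta))
```

In practice the product `A @ D` would not be formed explicitly: differencing costs $O(n)$ and the Toeplitz multiplication can use the FFT.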
Figure 2: Empirical probability of success as a function of number of observations $k$ ($n = 2048$, $m = 20$).


10  Numerical  Results 

In this section, we numerically compare the performance of Toeplitz and circulant CS matrices to that of IID ones.
The experimental setup involves generating a length $n = 2048$ signal with randomly placed $m = 20$ non-zero entries
drawn independently from $\mathcal{N}(0, 1)$. Each such generated signal is sampled using $k \times n$ IID, Toeplitz and circulant
matrices with entries drawn independently from the Bernoulli distribution ($+1/\sqrt{k}$ with probability 1/2, $-1/\sqrt{k}$ with probability 1/2)
and reconstructed using the gradient projection algorithm described in [5], where matrix multiplications
are carried out using FFT in the case of Toeplitz and circulant observation matrices. Success is declared if the
algorithm exactly recovers the signal (taking into account machine precision errors), and the empirical probability
of success for each value of $k$ is determined by repeating this process 1000 times and calculating the fraction of
successes. While running this experiment for all $x \in \mathbb{R}^n$ or even all $\binom{2048}{20}$ unique sparsity patterns does not seem
possible, simulation results show that for a large number of synthesized signals (and for the reasons described earlier),
Toeplitz and circulant matrices perform as well as IID ones in terms of the empirical probability of success. We plot
the empirical probability of success versus number of observations $k$ for one such signal in Fig. 2.


11  Conclusions 

In this part of the final report, we have shown that Toeplitz-structured matrices with random entries drawn independently
from a certain probability distribution are also sufficient to recover undersampled sparse signals. The use
of such matrices is a desirable alternative for a number of application areas because it greatly reduces the computational
and storage complexity in large-dimensional problems.$^5$ The result presented here can be extended to random
Toeplitz matrices with entries drawn from other distributions (such as zero-mean Gaussian) using similar proof
techniques.
Part III

Some Comparisons of NoLAff and
Randomized Projection Approaches and
Future Directions


There  are  several  future  research  directions  we  are  considering.  One  such  direction  is  comparing  the  performance  of 
“traditional”  CS  techniques  that  rely  on  a  single  input  channel  using  randomized  projections  and  other  techniques 
such  as  the  Nyquist  Folding  receiver  (NYFR)  with  multiple  channels  or  the  Nonlinear  Affine  (NoLAff)  receiver.  The 
next  section  describes  some  of  the  obvious  differences  of  the  various  techniques  from  a  sparseness  pattern  perspective. 
One approach to analyzing such techniques (e.g., NYFR and NoLAff) is based on having two channels: one that
is simply an undersampling of the received signal, and a second that is encoded either via a sensing matrix or an
approximation to one.

This two channel analysis for A-to-I may help with examining the limits of single channel NoLAff, in the sense
that the single channel contains both a linear (undersampled) stream of data and a nonlinear affine stream of
data, which in turn is used to remove the ambiguity.

We  then  compare  the  NoLAff  approach  to  L1-L2  techniques  from  an  encoding  matrix  point  of  view.  This  provides 
further  context  for  choosing  non-random  or  structured  forms  of  encoding. 

12  A Comparison of Undersampling Approaches From a Sparseness Pattern Perspective

A potentially useful way to compare and analyze various undersampling approaches is to examine them
from a sparseness pattern perspective. Various algorithms treat the issue of sparseness pattern differently
and consequently, regardless of actual implementation issues and indeed algorithm specifics, we can intuitively understand
the differences in performance bounds and sensitivity between these approaches. In the following we examine
the problem of sparseness patterns and undersampling for several approaches including: standard compressive sensing
approaches which assume nothing about such patterns (e.g., basis pursuit in its various incarnations), NoLAff, and
Variable Projection and Unfolding (VPU).

We start with the standard formulation of the problem where we have $n$ samples (possibly samples per unit
time, or per unit space, etc.) of a signal $x$ which is described fully in some known decomposition $\Phi$ (basis or much
larger dictionary) with only $k$ non-zero coefficients. That is, $k$ represents the information content (or possibly the
information rate, or information density). We note that we are ignoring, for the sake of this discussion, important
issues such as the number of bits that these $n$ samples and $k$ coefficients must have to accurately recover the
signal $x$. One of the fundamental questions in the A-to-I program is: how few samples $m$ can we use to accurately
reconstruct $x$? Further, we would like to get an intuitive feeling for the cost of reconstructing $x$ from these $m$ samples
and for the sensitivities that reconstruction algorithms have.

We note that the $m$ samples are taken from $y$ rather than from $x$, where $y$ is some transformation of $x$ that ensures
that all necessary information about $x$ is highly likely to be present in the samples of $y$. The transformation in question
is often represented in the form of a "random" projection of $x$, say $y = \Psi x$, where $\Psi$ is assumed to be known. In the
case of NoLAff the transformation is not such a linear operator but rather a nonlinear and affine transformation;
however, the conditions on the NoLAff encoder are similar in spirit to those of the "random" projection.

Without a so-called "genie" aided solution, where we know a priori which $k$ coefficients are non-zero, we have at
least $\binom{n}{k}$ possible combinations of non-zero coefficient choices.$^6$ These are the sparseness patterns we seek; that is, we
wish to know the locations of the $k$ non-zero coefficients as well as, of course, their values. In some cases knowing just
the locations is sufficient, such as in detection settings.

12.0.1  The  trivial  solution 

Having described the problem at hand in these terms, we examine some potential solutions. The trivial approach is to
search all $\binom{n}{k}$ combinations and to minimize the error between the observations $y$ and the transformed signal $x$. For

6 In the case of dictionaries that are made, say, of multiple bases we may have a significantly larger number of sparseness
patterns to choose from.
example we could choose to solve the L2, L0 problem

$$\min_{\text{all sparseness patterns}} \|y - \Psi\Phi s\|_2^2 \quad \text{s.t.} \quad \|s\|_0 \le k \tag{27}$$

where $s$ represents the $k$-sparse vector of the dictionary's coefficients. Equivalently, in the truly noiseless case we can
reformulate this as

$$\min \|s\|_0 \quad \text{s.t.} \quad y = \Psi\Phi s. \tag{28}$$

Needless to say, this approach is computationally intractable and indeed solving it as such has combinatorial complexity
that is irreducible in general. However, were we to choose this as our algorithm for reconstruction we could
have $m \ge k$, assuming the transformation $\Psi$ is chosen appropriately.

12.0.2  Standard  compressive  sensing  approaches 

A far superior class of approaches is derived from the insight that L2, L0 problems such as that stated
above can, under some conditions, be solved exactly using an L2, L1 problem. For example, we replace (28) with

$$\min \|s\|_1 \quad \text{s.t.} \quad y = \Psi\Phi s. \tag{29}$$

Here we have a linear programming problem, which is inherently a low complexity one. In the noisy case we may
instead solve convex optimization problems such as

$$\min \left\{ \|y - \Psi\Phi s\|_2^2 + \lambda \|s\|_1 \right\} \tag{30}$$

which are also tractable. However, since we do not know the sparseness pattern of $s$ we must include more samples
than merely those that represent the values of the non-zero entries. We must include enough samples to also decode
the locations of those $k$ entries. Indeed, to encode these $k$ positions among at least $n$ locations we require something
like $m \ge c\,k \log(n)$, where $c$ is a constant that depends on many things, including the algorithm's specifics, which we
ignore for the present discussion. The point, however, is that we are paying a price for not knowing the sparseness
pattern.
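As a rough illustration of this price, the sketch below counts the bits needed merely to index one of the $\binom{n}{k}$ sparseness patterns and compares the count with the bounds $k \log_2(n/k)$ and $k \log_2(n)$. This is a heuristic counting argument, not a statement about any particular reconstruction algorithm; the helper name `pattern_bits` and the values $n = 2048$, $k = 20$ are assumptions chosen for illustration (with $n$ and $k$ used as in this section).

```python
import math

def pattern_bits(n, k):
    """Bits needed to index one of the C(n, k) possible sparseness patterns."""
    return math.log2(math.comb(n, k))

n, k = 2048, 20                                # illustrative values (assumption)
bits_exact = pattern_bits(n, k)
bits_lower = k * math.log2(n / k)              # from C(n, k) >= (n / k)^k
bits_upper = k * math.log2(n)                  # from C(n, k) <= n^k
```

The exact count always lies between the two bounds, which is why the location information alone scales like $k \log(n)$ up to constants.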


12.0.3  NoLAff 

The NoLAff approach provides a transformation (encoding) and a reconstruction algorithm (decoding) that does
not lose all the sparseness pattern information (unlike standard CS approaches). Indeed, in NoLAff, following a mild
nonlinear affine transform which retains much of the original signal $x$, we undersample by a factor of $n/m$. That is,
we retain $m$ samples in which the $k$ basis vectors of interest are present up to the obvious $n/m$-ary ambiguity (due to
aliasing). The residual nonlinear component of the $m$ samples contains enough information to resolve the ambiguity,
since each of the $n/m$ Nyquist zones is associated with a different (and known) nonlinear and affine transformation
characteristic. By solving the resulting $n/m$-ary hypothesis testing problem $k$ times we reconstruct the signal $x$. Since
we did not lose the sparseness pattern using NoLAff, we can have as few as $m = k$ samples (much like the trivial
approach above). We note, however, that solving trivial hypothesis testing problems and undoing the simple known
nonlinearities are very low complexity operations. Hence we have the best of both the trivial approach and the
standard CS approaches.

12.0.4  VPU 

Finally, we discuss VPU in terms of sparseness patterns. Without going into too much algorithmic detail we
can describe VPU's approach as follows: 1) scan all the rank one subspaces (in $\Phi$) for the "best" ones and keep the
"winners", 2) scan all the contiguous rank 2 subspaces (in $\Phi$) for the "best" ones and keep the "winners", 3) repeat
for higher rank contiguous subspaces. We of course go no higher than rank $m$ signals. This approach has many
advantages as well as some significant disadvantages, principally computational complexity.

We  can  think  of  VPU  as  a  “greedy”  algorithm  that  chooses  (suboptimally)  the  best  subspace  found  so  far 
containing  signals,  by  enumerating  the  various  combinations  in  a  given  order.  This  allows  us  to  utilize  any  prior 
information  about  the  likelihood  of  particular  signal  subspaces  appearing  in  the  received  signal  of  interest.  It  is  clear 
therefore  why  VPU  has  high  computational  complexity  relative  to  other  CS  algorithms  while  at  the  same  time  having 
superior  reconstruction  accuracy  when  one  chooses  the  order  of  subspace  enumeration  intelligently. 
12.0.5  Some Additional Comparative Remarks

We note that due to their nature both NoLAff and VPU have some additional robustness to noise compared with
standard CS approaches. In addition, NoLAff has much better robustness to large dynamic range differences between
signal components. A further distinction of NoLAff is that it clearly does not require any Nyquist-rate switching
circuits in its encoder.

While VPU has a significant computational complexity penalty, we note that the idea of exploiting additional
information about the signal space is one that should be considered carefully. Whether additional
information about the sparseness pattern is known a priori or acquired adaptively, we assume that it can be used
to obtain a superior reconstruction of the signal x. It should be pointed out that VPU does have the ability to treat
any sparseness pattern (not just those it explicitly scans over), in the sense that scanning through all the rank r
subspaces does allow any pattern to exist. However, if VPU stopped there it would be essentially equivalent
to orthogonal matching pursuit techniques (if using rank 1 subspaces) and would suffer from the same limitations.

We note that these algorithmic approaches are therefore qualitatively different from the class of standard CS
techniques, which must recover the sparseness pattern without encoding it (as in NoLAff) or making additional
assumptions about it (as in VPU).


13  A  NoLAff  Comparison  to  L1-L2  Sparse  Reconstruction 

A special case of nonlinear sensing involves nonlinear analog encoding in the presence of a strong, well-defined signal
p (e.g., a probe signal additionally injected into the receiver stream).

Let x be an input signal to a receiver and let a dictionary matrix T be assembled, where the columns span the
vector space of input signals:

    x = T \theta.    (31)

The vector \theta is referred to as the information vector.

The signal x is passed through a nonlinear system NoLAff(·), producing the output

    f(x) = g(x + p) - g(p),    (32)

where p represents the probe signal and the function g(·) implements the nonlinearity

    g(\cdot) = \sum_{k=1}^{\infty} a_k (\cdot)^{\bullet k}.    (33)

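Without loss of generality, g here is memoryless. Element-wise multiplication and exponentiation are denoted with a • where appropriate.
The output of the NoLAff function is approximately linear with respect to the input when

    \|p\| \gg \|x\|.    (34)

In this case, keeping only the terms that are first order in x,

    f(x) = g(x + p) - g(p)
         = \sum_{k=1}^{\infty} a_k (x + p)^{\bullet k} - \sum_{m=1}^{\infty} a_m p^{\bullet m}
         \approx \sum_{k=1}^{\infty} k a_k \, x \bullet p^{\bullet (k-1)}.    (35)

Therefore,

    f(x) \approx \Big( \sum_{k=1}^{\infty} k a_k \, p^{\bullet (k-1)} \Big) \bullet x
         = \sum_{k=1}^{\infty} k a_k \, \mathrm{diag}(p)^{(k-1)} x
         = N_L x,    (36)

where N_L = \sum_{k=1}^{\infty} k a_k \, \mathrm{diag}(p)^{(k-1)}.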
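
The quasi-linear approximation can be checked numerically. The following is a minimal sketch in Python with NumPy, not the implementation used in this work; the coefficient values, the probe amplitude and frequency, and the signal length are illustrative assumptions. It compares the exact output g(x + p) - g(p) with the product of diag(g'(p)) and x, where g'(p) is the element-wise derivative of the polynomial nonlinearity evaluated at the probe.

```python
import numpy as np

def g(v, a):
    """Memoryless polynomial nonlinearity g(v) = sum_k a[k-1] * v**k (element-wise)."""
    return sum(a_k * v ** k for k, a_k in enumerate(a, start=1))

def nolaff(x, p, a):
    """NoLAff output f(x) = g(x + p) - g(p), as in Equation (32)."""
    return g(x + p, a) - g(p, a)

def linearized_matrix(p, a):
    """Quasi-linear matrix diag(g'(p)), with g'(p) = sum_k k * a_k * p**(k-1)."""
    slope = sum(k * a_k * p ** (k - 1) for k, a_k in enumerate(a, start=1))
    return np.diag(slope)

rng = np.random.default_rng(0)
n = 64
t = np.arange(n)
a = [1.0, 0.0, 0.1]                        # assumed: linear pass-through plus a cubic term
p = 10.0 * np.cos(2 * np.pi * 5 * t / n)   # assumed strong tonal probe
x = 1e-3 * rng.standard_normal(n)          # weak input, so that ||x|| << ||p||

exact = nolaff(x, p, a)
approx = linearized_matrix(p, a) @ x
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative linearization error: {rel_err:.2e}")
```

Increasing the scale of x relative to p makes the relative error grow, which is the regime in which the quasi-linear model no longer applies.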
Let the measurement model be


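    y = \Phi f(T\theta)
      \approx \Phi N_L T \theta.    (37)

Hence, the calculations for the convexity of the LASSO-type cost function are simplified. Starting from

    L(\theta) = \|y - \Phi N_L T \theta\|_2^2 + \lambda \|\theta\|_1,    (38)

and

    J(\theta) = \|y - \Phi N_L T \theta\|_2^2,    (39)

the gradient is

    \nabla J(\theta) = \frac{\partial}{\partial \theta} J(\theta)
                     = 2 T^H N_L^H \Phi^H \Phi N_L T \theta - 2 T^H N_L^H \Phi^H y,    (40)

and therefore the Hessian is

    \nabla^2 J(\theta) = \frac{\partial}{\partial \theta} \nabla J(\theta)
                       = 2 T^H N_L^H \Phi^H \Phi N_L T.    (41)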

The Hessian in Equation (41) is positive semidefinite, so J(\theta) is convex; since the L1 penalty is also convex, the full cost function is convex for the NoLAff modulation.
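
The convexity argument can be spot-checked numerically. The sketch below is illustrative only: the unitary DFT dictionary, the Gaussian random sensing matrix, the probe, the coefficients a_1 = 1 and a_3 = 0.1, and the dimensions are all assumptions made for the example. It forms the effective linear model from the sensing matrix, the quasi-linear matrix, and the dictionary, confirms that the Hessian of Equation (41) has no negative eigenvalues beyond rounding error, and tests midpoint convexity of the full LASSO-type cost at two random points.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 32, 12                                    # assumed signal length and measurement count
t = np.arange(n)

T = np.fft.fft(np.eye(n)) / np.sqrt(n)           # unitary DFT dictionary (assumed choice)
p = 10.0 * np.cos(2 * np.pi * 3 * t / n)         # assumed tonal probe
N_L = np.diag(1.0 + 3 * 0.1 * p ** 2)            # diag(g'(p)) for a_1 = 1, a_3 = 0.1
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # assumed random sensing matrix

A = Phi @ N_L @ T                                # effective linear model y ~ A theta
hessian = 2 * A.conj().T @ A                     # Equation (41)

eigvals = np.linalg.eigvalsh(hessian)            # Hermitian matrix, so eigenvalues are real
min_eig, max_eig = eigvals.min(), eigvals.max()
print("smallest / largest Hessian eigenvalue:", min_eig, max_eig)

def cost(theta, y, lam=0.1):
    """LASSO-type cost ||y - A theta||_2^2 + lam * ||theta||_1."""
    return np.linalg.norm(y - A @ theta) ** 2 + lam * np.abs(theta).sum()

# Midpoint convexity spot check at two random complex points.
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
midpoint_ok = cost((u + v) / 2, y) <= (cost(u, y) + cost(v, y)) / 2 + 1e-9
print("midpoint convexity holds:", midpoint_ok)
```

Because m < n in this example, the Hessian is rank deficient and its smallest eigenvalues are zero up to rounding, which is consistent with positive semidefiniteness rather than strict positive definiteness.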

13.0.6  Right-Side  Factorization 

An interesting alternative NoLAff derivation is the right-hand decomposition, which creates an output signal dictionary.
Thus both the input and the output are sparsely represented in their respective dictionaries. We note that the output
signal is not sparse in the input dictionary; however, it is sparse in the new dictionary \tilde{T} and hence can be
decoded from an undersampled representation. \tilde{T} is a nonlinear affine transformation of T. The complex coefficients that describe
how the NoLAff function spreads information from the dictionary T into the dictionary \tilde{T} can be collected into an n x n
matrix N_R, such that

    T N_R = \tilde{T}.    (42)

We  note  that  the  output  dictionary  is  a  function  of  our  choice  of  probe  signal. 
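
As a rough illustration of the factorization in Equation (42), the sketch below assumes that, in the quasi-linear regime, the output dictionary is obtained by applying the linearized operator diag(g'(p)) to the input dictionary. That identification, the unitary DFT dictionary, the single real tonal probe, and the coefficient values are assumptions of this example rather than statements about the system studied here. Under these assumptions the right-side factor has only three nonzero entries per column: the main diagonal and two shifted diagonals at plus and minus twice the probe frequency.

```python
import numpy as np

n = 32
t = np.arange(n)
f0 = 3                                       # assumed probe frequency bin
amp = 10.0                                   # assumed probe amplitude
a1, a3 = 1.0, 0.1                            # assumed linear and third-order coefficients

T = np.fft.fft(np.eye(n)) / np.sqrt(n)       # unitary DFT input dictionary
p = amp * np.cos(2 * np.pi * f0 * t / n)     # real tonal probe
N_L = np.diag(a1 + 3 * a3 * p ** 2)          # linearized operator diag(g'(p))

T_out = N_L @ T                              # assumed output dictionary
N_R = T.conj().T @ T_out                     # right-side factor, since T is unitary

# Count the significant entries in each column of N_R.
nnz_per_col = (np.abs(N_R) > 1e-9).sum(axis=0)
print("nonzeros per column:", nnz_per_col.min(), "to", nnz_per_col.max())
print("main-diagonal value:", N_R[0, 0].real)
```

Changing f0 or amp moves or rescales the off-diagonal entries, which reflects the dependence of the output dictionary on the choice of probe.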

One can depict the notion of CS via a NoLAff-inspired sensing matrix as in Figure 3. Here the ADC is preceded
by a nonlinear device that contains a linear pass-through and a third-order nonlinearity. The input into the receiver
is augmented by a tonal probe, and the standard assumptions of relatively weak nonlinearities and a very strong
probe hold. We choose here a DFT matrix as the dictionary.

It is clear that the sensing matrix here has retained much of the orthogonality of the input dictionary in the
output. We can therefore easily conceive of other decoding schemes that do not require convex optimization
with all its drawbacks. In particular, we note that the dynamic range of input signals is inherently
limited by the L1-L2 optimization approach, whereas here there is very little overlap between the various signal components
in the output. Pictorially, these components can be seen as the yellow squares in Figure 3. We note that the main diagonal
directly represents the linear pass-through component.

We further note that an encoding based on NoLAff provides a sliding scale of signal spreading, from the one
just described, in which much of the input orthogonality and sparseness is preserved, to an almost random one. One
only has to add several more probe signals, as depicted in Figure 4.
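
The growth in spreading with the number of probes can be sketched numerically. The example below is a hedged illustration under the same quasi-linear model with a DFT dictionary and a linear plus third-order nonlinearity; the helper name spread, the probe frequency bins, and the amplitudes are hypothetical. It relies on the fact that, for a DFT dictionary, a diagonal operator maps to a circulant matrix, so the number of nonzero entries per column of the spreading matrix equals the number of nonzero DFT bins of g'(p).

```python
import numpy as np

def spread(probe_bins, n=128, amp=10.0, a1=1.0, a3=0.1):
    """Count nonzero entries per column of the spreading matrix for a DFT dictionary.

    Assumes g'(p) = a1 + 3 * a3 * p**2 and a probe made of equal-amplitude cosines.
    """
    t = np.arange(n)
    p = sum(amp * np.cos(2 * np.pi * f * t / n) for f in probe_bins)
    slope_spectrum = np.fft.fft(a1 + 3 * a3 * p ** 2) / n
    return int((np.abs(slope_spectrum) > 1e-9).sum())

# One probe tone, then two, three, and four tones (bins are arbitrary choices).
counts = [spread(bins) for bins in ([3], [3, 7], [3, 7, 12], [3, 7, 12, 20])]
print("nonzeros per column as probes are added:", counts)
```

Each added tone contributes its own second harmonic plus sum and difference frequencies with every existing tone, so the support grows roughly quadratically with the number of probes. The sketch only tracks the size of the support; it does not by itself show that the resulting matrix behaves like a random one.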

14 A Summary and Some Final Thoughts

In all the approaches to compressive sensing explored in this work there are common threads that relate to
moving away from purely random encoding of undersampled data. In the Toeplitz structured sensing matrix approach
described above, we have shown that a highly structured (and hence practically efficient) sensing matrix is
possible without degradation of reconstruction performance. The adaptive approach described above starts from a
randomized sensing matrix, which is a “democratic” approach, but quickly utilizes the partial information gathered
with each sample to move adaptively to a data-dependent sensing matrix. The comparison of randomized projection
approaches (e.g., L2-L1 techniques) to NoLAff provides yet another take on this theme of moving away from purely
randomized projections. While NoLAff does not, strictly speaking, use a sensing matrix, it can nonetheless be shown
to have nearly equivalent encoding structures that can be described, in the quasi-linear approximation, as a
deterministic sensing matrix. This approach in particular allows us to move away from convex optimization
decoding to a hypothesis testing approach, which has been shown to be highly efficient from a data rate
perspective, essentially allowing innovations-rate sampling.